`unicode_data` refactors #147622
2c5244e to 90adbe2
90adbe2 to 1a646cf
}

fn rustfmt(path: &str) {
    std::process::Command::new("rustfmt").arg(path).status().expect("rustfmt failed");
Is `rustfmt` really always in `PATH` when this command is run? Otherwise, I think it'd be easier to slap a big `#[rustfmt::skip]` on the `mod unicode_data`.
The intention is to keep the generated `unicode_data.rs` readable without having to carefully construct well-formatted code in the metaprogram. For that, we need to run `rustfmt`.
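One way to address the `PATH` concern while still formatting the output is to treat a missing formatter as a soft failure. The following is only a sketch, not what this PR does: `run_formatter` and its return convention are hypothetical. Note that `.status().expect(...)` as quoted above panics only when the process cannot be spawned; a non-zero exit from `rustfmt` passes silently, which this sketch also surfaces.

```rust
use std::io;
use std::process::Command;

/// Hypothetical helper: runs `program` on `path`.
/// Returns Ok(true) if the formatter ran and succeeded,
/// Ok(false) if the formatter binary could not be found,
/// and Err(..) for any other failure (including a non-zero exit).
fn run_formatter(program: &str, path: &str) -> io::Result<bool> {
    match Command::new(program).arg(path).status() {
        // The process ran and reported success.
        Ok(status) if status.success() => Ok(true),
        // The process ran but exited with a failure code.
        Ok(status) => Err(io::Error::new(
            io::ErrorKind::Other,
            format!("{program} exited with {status}"),
        )),
        // The binary is not available: leave the file unformatted.
        Err(err) if err.kind() == io::ErrorKind::NotFound => Ok(false),
        // Any other spawn error is passed through to the caller.
        Err(err) => Err(err),
    }
}
```

A caller could then do `run_formatter("rustfmt", path)` and print a warning on `Ok(false)` instead of aborting the generator.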
1b56c98 to dc0dcf5
This PR was rebased onto a different master commit. Here's a range-diff highlighting what actually changed. Rebasing is a normal part of keeping PRs up to date, so no action is needed; this note is just to help reviewers.
Instead of `include_str!()`ing `range_search.rs`, just make it a normal module under `core::unicode`. This means the same source code doesn't have to be checked in twice, and it plays nicer with IDEs. Also rename it to `rt` since it includes functions for searching the bitsets and case conversion tables as well as the range representation.
Remove `#[rustfmt::skip]` from all the generated modules in `unicode_data.rs`. This means we won't have to worry so much about getting indentation and formatting right when generating code. For now, some tables that would be too big when formatted by `rustfmt` remain exempted.
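As an illustration of exempting individual tables rather than whole modules, here is a minimal sketch. The module, table, and function names are hypothetical stand-ins, not items from `unicode_data.rs`; the point is only that `#[rustfmt::skip]` can be attached to a single item.

```rust
mod generated_tables {
    // Hypothetical stand-in for a large generated table. The attribute
    // applies to this item only, so its hand-packed layout is preserved.
    #[rustfmt::skip]
    pub static SMALL_PRIMES: [u8; 16] = [
         2,  3,  5,  7, 11, 13, 17, 19,
        23, 29, 31, 37, 41, 43, 47, 53,
    ];

    // No skip attribute here, so rustfmt is free to format this function.
    pub fn nth_prime(index: usize) -> Option<u8> {
        SMALL_PRIMES.get(index).copied()
    }
}
```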
This check was made redundant (it will always be true) when we removed all ASCII characters from the tables (rust-lang@a8c6694).
To make the final output code easier to see:

* Get rid of the unnecessary line-noise of `.unwrap()`ing calls to `write!()` by moving the `.unwrap()` into a macro.
* Join consecutive `write!()` calls using a single multiline format string.
* Replace `.push()` and `.push_str(format!())` with `write!()`.
* If after doing all of the above, there is only a single `write!()` call in the function, just construct the string directly with `format!()`.
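The first two points can be sketched roughly as follows. The macro name `emit!` and the function `render_table` are hypothetical and chosen for illustration; the macro actually used in the generator is not shown in this conversation.

```rust
use std::fmt::Write as _;

// Hypothetical wrapper: writing to a `String` cannot fail, so the
// `.unwrap()` is moved here instead of being repeated at every call site.
macro_rules! emit {
    ($dst:expr, $($arg:tt)*) => {
        write!($dst, $($arg)*).unwrap()
    };
}

// Renders a small table as Rust source text.
fn render_table(name: &str, values: &[u32]) -> String {
    let mut out = String::new();
    // One multiline format string instead of two consecutive writes.
    // The trailing `\` joins the lines without adding whitespace.
    emit!(
        out,
        "// Generated table (sketch).\n\
         pub static {name}: [u32; {len}] = [\n",
        name = name,
        len = values.len(),
    );
    for v in values {
        emit!(out, "    {v:#x},\n", v = v);
    }
    emit!(out, "];\n");
    out
}
```

A function whose body would shrink to a single `emit!` call can simply return `format!(...)` instead, as the last point describes.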
dc0dcf5 to d2c9773
Instead of generating a standalone executable to test `unicode_data`, generate normal tests in `coretests`. This ensures tests are always generated, and will be run as part of the normal test suite. Also change the generated tests to loop over lookup tables, rather than generating a separate `assert_eq!()` statement for every codepoint. The old approach produced a massive (20,000+ lines) file which took minutes to compile!
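The table-driven shape described here might look roughly like the sketch below. The table `TO_LOWER` and the function `check_lowercase_table` are hypothetical, and the table is tiny for readability; the generated tests in `coretests` may be organized differently.

```rust
// Hypothetical lookup table of (input, expected lowercase) pairs.
// A generated test would carry many more entries.
static TO_LOWER: &[(char, char)] = &[('A', 'a'), ('Σ', 'σ'), ('Ж', 'ж')];

// Checks every entry with one loop instead of one `assert_eq!()` per
// codepoint, and returns the number of entries checked.
fn check_lowercase_table(table: &[(char, char)]) -> usize {
    for &(input, expected) in table {
        let mut mapped = input.to_lowercase();
        assert_eq!(mapped.next(), Some(expected), "wrong mapping for {input:?}");
        assert_eq!(mapped.next(), None, "unexpected extra char for {input:?}");
    }
    table.len()
}
```

The amount of code to compile stays constant as the table grows, since only the data changes; that is the property the commit message relies on.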
d2c9773 to 41d988f
Minor refactors to `unicode_data` that occurred to me while trying to reduce the size of the tables. Splitting into a separate PR. NFC